command sequence

MamTiff-CAD: Multi-Scale Latent Diffusion with Mamba+ for Complex Parametric Sequence

Deng, Liyuan, Bai, Yunpeng, Dai, Yongkang, Huang, Xiaoshui, Gan, Hongping, Huang, Dongshuo, Hao, Jiacheng, Shi, Yilei

arXiv.org Artificial Intelligence

Parametric Computer-Aided Design (CAD) is crucial in industrial applications, yet existing approaches often struggle to generate long parametric command sequences due to the geometric and topological constraints of complex CAD models. To address this challenge, we propose MamTiff-CAD, a novel framework for generating CAD parametric command sequences that leverages a Transformer-based diffusion model over multi-scale latent representations. Specifically, we design a novel autoencoder that integrates Mamba+ and Transformer components to map parameterized CAD sequences into latent representations. The Mamba+ block incorporates a forget gate mechanism to effectively capture long-range dependencies, and a non-autoregressive Transformer decoder reconstructs the sequences from the latent representations. A diffusion model based on a multi-scale Transformer is then trained on these latent embeddings to learn the distribution of long command sequences. We also construct a dataset of long parametric sequences, with up to 256 commands per CAD model. Experiments demonstrate that MamTiff-CAD achieves state-of-the-art performance on both reconstruction and generation tasks, confirming its effectiveness for long-sequence (60-256 command) CAD model generation.
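The forget-gate idea behind the Mamba+ block can be sketched as a gated recurrence. This is a deliberate scalar simplification, not the paper's actual parameterization (the real block operates on vector-valued selective-state-space updates):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_scan(xs, w_f=0.5, b_f=0.0):
    """Minimal forget-gated recurrence over a 1-D sequence:
    h_t = f_t * h_{t-1} + (1 - f_t) * x_t, where the forget gate
    f_t = sigmoid(w_f * x_t + b_f) decides how much long-range state
    to retain at each step. (Hypothetical toy version of Mamba+.)"""
    h = 0.0
    states = []
    for x in xs:
        f = sigmoid(w_f * x + b_f)
        h = f * h + (1.0 - f) * x
        states.append(h)
    return states
```

Because the gate is input-dependent, informative commands can overwrite the state while filler commands let the old state persist, which is what makes such blocks attractive for very long sequences.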


Context-Enhanced Granular Edit Representation for Efficient and Accurate ASR Post-editing

Vejsiu, Luan, Zheng, Qianyu, Chen, Haoxuan, Han, Yizhou

arXiv.org Artificial Intelligence

Despite the industry-wide adoption of ASR technology across large portions of the population, ASR systems often produce errors that require post-editing to restore text quality. While LLMs are powerful post-editing tools, baseline full-rewrite models are inefficient at inference because they regenerate large amounts of unchanged text. Compact edit representations exist, but they often lack the efficacy and context required for optimal accuracy. This paper introduces CEGER (Context-Enhanced Granular Edit Representation), a compact edit representation designed for highly accurate, efficient ASR post-editing. CEGER allows LLMs to generate a sequence of structured, fine-grained, contextually rich commands that modify the original ASR output; a separate expansion module deterministically reconstructs the corrected text from these commands. In extensive experiments on the LibriSpeech dataset, CEGER achieves state-of-the-art accuracy, attaining the lowest word error rate (WER) compared with full rewriting and prior compact representations.
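The deterministic expansion step can be sketched as applying a list of granular edit commands to the ASR token sequence. The command format below (op, index, words) is a hypothetical simplification; CEGER's actual representation is richer and context-anchored:

```python
def expand(asr_tokens, commands):
    """Deterministically reconstruct corrected text from granular edits.
    Supported toy ops: ('sub', i, w) replaces token i with w,
    ('ins', i, w) inserts w before position i, ('del', i) deletes
    token i. (Illustrative format, not CEGER's actual schema.)"""
    tokens = list(asr_tokens)
    # Apply edits right-to-left so earlier indices stay valid.
    for op, i, *rest in sorted(commands, key=lambda c: c[1], reverse=True):
        if op == "sub":
            tokens[i] = rest[0]
        elif op == "ins":
            tokens.insert(i, rest[0])
        elif op == "del":
            del tokens[i]
    return " ".join(tokens)
```

The efficiency win is that the LLM emits only the short command list, not the full corrected transcript.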


Text-to-CadQuery: A New Paradigm for CAD Generation with Scalable Large Model Capabilities

Xie, Haoyang, Ju, Feng

arXiv.org Artificial Intelligence

Computer-aided design (CAD) is fundamental to modern engineering and manufacturing, but creating CAD models still requires expert knowledge and specialized software. Recent advances in large language models (LLMs) open up the possibility of generative CAD, where natural language is directly translated into parametric 3D models. However, most existing methods generate task-specific command sequences that pretrained models cannot directly handle. These sequences must be converted into CAD representations such as CAD vectors before a 3D model can be produced, which requires training models from scratch and adds unnecessary complexity. To tackle this issue, we propose generating CadQuery code, a Python-based scripting language, directly from text, leveraging the strengths of pretrained LLMs to produce 3D models without intermediate representations. Since LLMs already excel at Python generation and spatial reasoning, fine-tuning them on Text-to-CadQuery data proves highly effective. Given that these capabilities typically improve with scale, we hypothesize that larger models will perform better after fine-tuning. To enable this, we augment the Text2CAD dataset with 170,000 CadQuery annotations. We fine-tune six open-source LLMs of varying sizes and observe consistent improvements. Our best model achieves a top-1 exact match of 69.3%, up from 58.8%, and reduces Chamfer Distance by 48.6%. Project page: https://github.com/Text-to-CadQuery/Text-to-CadQuery.
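A Text-to-CadQuery fine-tuning pair might be packaged as an instruction-tuning record like the following. The field names are illustrative assumptions, not the dataset's actual schema; the CadQuery snippet itself uses the library's standard fluent `Workplane` API:

```python
import json

def make_record(prompt, cadquery_code):
    """Package one text-to-code training pair for instruction tuning.
    (Field names 'instruction'/'output' are illustrative, not the
    published dataset's schema.)"""
    return {"instruction": prompt, "output": cadquery_code}

example = make_record(
    "Create a 10 x 20 x 5 rectangular plate.",
    # CadQuery models are plain Python: chained calls on a Workplane.
    'import cadquery as cq\nresult = cq.Workplane("XY").box(10, 20, 5)',
)
serialized = json.dumps(example)
```

Because the target is ordinary Python, a pretrained LLM needs no new output vocabulary, which is the paper's core argument against bespoke command-sequence formats.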


RLCAD: Reinforcement Learning Training Gym for Revolution Involved CAD Command Sequence Generation

Yin, Xiaolong, Lu, Xingyu, Shen, Jiahang, Ni, Jingzhe, Li, Hailong, Tong, Ruofeng, Tang, Min, Du, Peng

arXiv.org Artificial Intelligence

A CAD command sequence is a typical parametric design paradigm in 3D CAD systems, where a model is constructed by overlaying 2D sketches with operations such as extrusion, revolution, and Boolean operations. Although there is growing academic interest in the automatic generation of command sequences, existing methods and datasets only support operations such as 2D sketching, extrusion, and Boolean operations. This limitation makes it challenging to represent more complex geometries. In this paper, we present a reinforcement learning (RL) training environment (gym) built on a CAD geometric engine. Given an input boundary representation (B-Rep) geometry, the policy network in the RL algorithm generates an action. This action, along with previously generated actions, is processed within the gym to produce the corresponding CAD geometry, which is then fed back into the policy network. The rewards, determined by the difference between the generated and target geometries within the gym, are used to update the RL network. Our method supports operations beyond sketching, extrusion, and Boolean operations, including revolution. With this training gym, we achieve state-of-the-art (SOTA) quality in generating command sequences from B-Rep geometries. In addition, our method improves the efficiency of command sequence generation by a factor of 39 compared with the previous training gym.
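The gym's feedback loop can be sketched with a toy environment. Here the "geometry" is a bare scalar volume rather than a real B-Rep, and the reward is the step's progress toward the target, purely an illustration of the interface, not RLCAD's engine:

```python
class ToyCADGym:
    """Toy stand-in for a CAD command-sequence gym: state is a running
    'geometry' (a scalar volume here), each action adds or removes
    material, and the reward is the reduction in distance to the target
    geometry. (Illustrative only; the real gym runs a geometric engine
    on B-Rep data.)"""

    def __init__(self, target_volume):
        self.target = target_volume
        self.volume = 0.0

    def step(self, delta_volume):
        before = abs(self.target - self.volume)
        self.volume += delta_volume          # "execute" the CAD action
        after = abs(self.target - self.volume)
        reward = before - after              # progress toward the target
        done = after < 1e-6
        return self.volume, reward, done

gym = ToyCADGym(target_volume=10.0)
state, reward, done = gym.step(10.0)
```

A policy network would consume `state` (in practice, an encoding of the current and target geometries) and emit the next action, closing the loop the abstract describes.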


Stadium card stunts and the art of programming a crowd

Engadget

With college bowl season just around the corner, football fans across the nation will be dazzled, not just by the on-field action, but also by the intricate "card stunts" performed by members of the stadium's audience. The highly-coordinated crowd work is capable of producing detailed images that resemble the pixelated images on computer screens -- and which are coded in much the same manner. Michael Littman's new book, Code to Joy: Why Everyone Should Learn a Little Programming, is filled with similar examples of how the machines around us operate and how we need not distrust an automaton-filled future so long as we learn to speak their language (at least until they finish learning ours). From sequencing commands to storing variables, Code to Joy provides an accessible and entertaining guide to the very basics of programming for fledgling coders of all ages. Card stunts, in which a stadium audience holds up colored signs to make a giant, temporary billboard, are like flash mobs where the participants don't need any special skills and don't even have to practice ahead of time.
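The "coded in much the same manner" analogy is concrete: each seat receives a tiny command telling it which card to raise, and the stadium-wide picture is just those commands executed in parallel. A small hypothetical sketch:

```python
def render_stunt(rows, cols, commands):
    """Each seat raises the card its command names; reading the seats
    row by row reproduces the stadium-sized 'bitmap'. Commands map
    (row, col) -> color character; seats without a command show the
    default card. (Hypothetical illustration of the card-stunt idea.)"""
    default = "."
    return "\n".join(
        "".join(commands.get((r, c), default) for c in range(cols))
        for r in range(rows)
    )

# A 3x5 section where column 2 raises red ('R') cards.
picture = render_stunt(3, 5, {(r, 2): "R" for r in range(3)})
```

No seat needs to know the whole image, just as no single instruction in a program needs to understand the program's overall purpose.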


SimCURL: Simple Contrastive User Representation Learning from Command Sequences

Chu, Hang, Khasahmadi, Amir Hosein, Willis, Karl D. D., Anderson, Fraser, Mao, Yaoli, Tran, Linh, Matejka, Justin, Vermeulen, Jo

arXiv.org Artificial Intelligence

User modeling is crucial to understanding user behavior and essential for improving user experience and personalized recommendations. When users interact with software, vast amounts of command sequences are generated through logging and analytics systems. These command sequences contain clues to the users' goals and intents. However, these data modalities are highly unstructured and unlabeled, making it difficult for standard predictive systems to learn from. We propose SimCURL, a simple yet effective contrastive self-supervised deep learning framework that learns user representation from unlabeled command sequences. Our method introduces a user-session network architecture, as well as session dropout as a novel way of data augmentation. We train and evaluate our method on a real-world command sequence dataset of more than half a billion commands. Our method shows significant improvement over existing methods when the learned representation is transferred to downstream tasks such as experience and expertise classification.
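The session-dropout augmentation can be sketched directly: drop whole sessions at random from a user's history to create two "views" of the same user, which then form a positive pair for the contrastive loss. The function below is a minimal sketch of that idea, not SimCURL's implementation:

```python
import random

def session_dropout(user_sessions, p=0.5, seed=None):
    """Session dropout as a contrastive augmentation (sketch of the
    SimCURL idea): randomly drop whole sessions from a user's history.
    Always keeps at least one session so the view is non-empty."""
    rng = random.Random(seed)
    kept = [s for s in user_sessions if rng.random() > p]
    return kept or [rng.choice(user_sessions)]

sessions = [["open", "sketch"], ["extrude", "save"], ["undo"]]
view_a = session_dropout(sessions, seed=1)
view_b = session_dropout(sessions, seed=2)
# view_a and view_b are two augmented views of the same user's
# command history, used as a positive pair in the contrastive loss.
```

Dropping at session granularity (rather than individual commands) preserves the within-session command order that carries the user's intent.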


Learning Device Models with Recurrent Neural Networks

Clemens, John

arXiv.org Machine Learning

In this paper we consider whether RNNs can learn functionally equivalent models of unknown computer hardware peripherals through input/output observation. Peripheral devices attach to a main computer and use both hardware within the device and driver software running on the main computer to perform a task, such as printing a page or sending a message. However, there are instances when hardware is accessible from the main system but driver software is not, rendering the peripheral unusable. This situation is prevalent in open source operating systems where driver software may not be available from the vendor. Without driver software or development documentation, it is incumbent on the system's owner to write software to make use of the peripheral. The device itself is a "black box", with no information directly available to the developer beyond a set of memory addresses to interact with the device and the observable output of the hardware itself. This leads to labor-intensive reverse engineering efforts with varying degrees of success (see e.g.
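The modeling setup can be sketched with a scalar Elman-style RNN cell: feed the observed inputs (e.g. values written to the device's memory-mapped registers) through a recurrent update whose hidden state stands in for the device's internal state. This is a pure-Python toy with fixed weights; the paper's models are trained vector-valued networks:

```python
import math

def rnn_step(h, x, w_hh=0.5, w_xh=1.0, b=0.0):
    """One Elman-style cell update, h' = tanh(w_hh*h + w_xh*x + b),
    for scalar state and input. In practice the state is a vector and
    the weights are learned from input/output observations of the
    black-box device."""
    return math.tanh(w_hh * h + w_xh * x + b)

def run(inputs):
    """Feed a sequence of observed device inputs through the cell,
    returning the hidden state after each step."""
    h = 0.0
    states = []
    for x in inputs:
        h = rnn_step(h, x)
        states.append(h)
    return states
```

Once trained, such a model predicts the device's observable outputs from its input history, giving a functionally equivalent stand-in where no driver source exists.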